Multilingual Entity Linking: Comparing English and Spanish
Abstract
The Entity Linking (EL) task is concerned with linking entity mentions in a text collection to their corresponding knowledge base entries. Most approaches have focused on EL over English text collections. However, some works propose language-independent or multilingual methods for performing EL over texts in many languages. In this paper, our goal is to see how well EL systems perform outside of their primary language (often English). We first provide a survey of EL approaches that present evaluations over multiple languages. We then provide the results of an initial study comparing selected entity linking APIs on equivalent documents and sentences in English and Spanish. Multilingual EL approaches fare best for Spanish, though all approaches still perform better on English text than on the corresponding Spanish text. This indicates that there is an important gap between EL techniques for English and for Spanish (and possibly for many other languages) that has not yet been addressed. We leave the investigation of the causes of this gap for future work; it could be due to many factors, for example, differences in existing multilingual knowledge bases.
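Comparing EL APIs across languages requires scoring each system's output against gold-standard annotations. Below is a minimal Python sketch of one common scoring scheme, assuming strict matching (a prediction counts only if both the mention span and the knowledge base entry match a gold annotation). The `Annotation` type, the `span` and `prf1` helpers, and the toy sentences and KB identifiers are hypothetical illustrations, not the paper's evaluation code or data.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """A linked mention: character offsets plus a knowledge base identifier."""
    start: int
    end: int
    kb_id: str


def span(text: str, surface: str, kb_id: str) -> Annotation:
    """Hypothetical helper: locate the first occurrence of `surface` in `text`."""
    start = text.find(surface)
    return Annotation(start, start + len(surface), kb_id)


def prf1(gold: set[Annotation], predicted: set[Annotation]) -> tuple[float, float, float]:
    """Micro precision, recall and F1 under strict matching (assumed scheme)."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical parallel sentences with illustrative KB identifiers.
en = "Madrid is a city in Spain."
es = "Madrid es una ciudad de España."

gold_en = {span(en, "Madrid", "Madrid"), span(en, "Spain", "Spain")}
pred_en = {span(en, "Madrid", "Madrid"), span(en, "Spain", "Spain")}

gold_es = {span(es, "Madrid", "Madrid"), span(es, "España", "Spain")}
# Toy system output: links "Madrid" to the wrong entry and misses "España".
pred_es = {span(es, "Madrid", "Real_Madrid_CF")}

for lang, gold, pred in [("en", gold_en, pred_en), ("es", gold_es, pred_es)]:
    p, r, f = prf1(gold, pred)
    print(f"{lang}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

On this toy data the loop prints `en: P=1.00 R=1.00 F1=1.00` and `es: P=0.00 R=0.00 F1=0.00`. Scoring equivalent English and Spanish documents with the same function keeps the cross-language comparison like-for-like; a lenient variant that accepts overlapping spans is another option, but this sketch does not implement it.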
Similar resources
Multilingual Event Detection using the NewsReader Pipelines
We describe a novel modular system for cross-lingual event extraction for English, Spanish, Dutch and Italian texts. The system consists of a ready-to-use modular set of advanced multilingual Natural Language Processing (NLP) tools. The pipeline integrates modules for basic NLP processing as well as more advanced tasks such as cross-lingual Named Entity Linking, Semantic Role Labeling and time...
Cross-lingual Wikification Using Multilingual Embeddings
Cross-lingual Wikification is the task of grounding mentions written in non-English documents to entries in the English Wikipedia. This task involves the problem of comparing textual clues across languages, which requires developing a notion of similarity between text snippets across languages. In this paper, we address this problem by jointly training multilingual embeddings for words and Wiki...
RPI BLENDER TAC-KBP2016 System Description
We used the Stanford CoreNLP toolkit (Manning et al., 2014b) for English name tagging. To extract name mentions from Chinese and Spanish documents, we used bi-directional LSTM (Long Short-Term Memory) networks, which can leverage long-distance features. The inputs to the networks are pretrained word embeddings and randomly initialized character embeddings. Both word embedding and character embeddings...
NameTag(TM) Japanese and Spanish Systems as Used for MET
We participated in the Multilingual Entity Task (MET) for Japanese and Spanish using SRA's multilingual text-indexing software, NameTag(TM). Its English version was used for the Named Entity Task (NE) in MUC-6 [2]. The NameTag Japanese and Spanish systems were customized to accommodate the MET-specific requirements and achieved high performance in both recall and precision.
UNIBA: Combining Distributional Semantic Models and Sense Distribution for Multilingual All-Words Sense Disambiguation and Entity Linking
This paper describes the participation of the UNIBA team in Task 13 of SemEval-2015 on Multilingual All-Words Sense Disambiguation and Entity Linking. We propose an algorithm able to disambiguate both word senses and named entities by combining the simple Lesk approach with information coming from both a distributional semantic model and usage frequency of meanings. The results for both ...